Cache-efficient numerical algorithms using graphics hardware
Authors
Abstract
We present cache-efficient algorithms for scientific computations using graphics processing units (GPUs). Our approach maps the nested loops in numerical algorithms to the texture mapping hardware and uses the GPU caches efficiently. This mapping exploits the inherent parallelism, pipelining and high memory bandwidth of GPUs. We further improve the performance of numerical algorithms by exploiting the fact that many data elements in a nested loop perform memory accesses at the same relative addresses. Based on this similarity of memory accesses across the data elements of the input array, we decompose the input arrays into sub-arrays with similar memory access patterns and process each sub-array separately. Our approach achieves high memory performance on GPUs by tiling the computation and thereby improving cache efficiency. Overall, our formulation for GPU-based algorithms extends the current graphics runtime APIs without exposing the underlying hardware complexity to the programmer. This makes it possible to achieve portability and higher performance across different GPUs. We use this approach to improve the performance of GPU-based sorting, fast Fourier transform and dense matrix multiplication algorithms. We also compare our results with prior GPU-based and CPU-based implementations on high-end processors. In practice, we observe a 2–10× improvement in performance.
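The tiling idea mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the paper's GPU implementation, which maps loops to texture mapping hardware; it only shows the general loop-tiling (blocking) technique for dense matrix multiplication, where the computation is split into small blocks so that each block's data is reused while it is likely still in cache. The function names and the `tile` parameter are assumptions made for illustration.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same result as matmul_naive, computed block by block.

    `tile` is a hypothetical tuning parameter: the side length of the
    square blocks. Each (i0, j0, p0) triple selects one block of work
    that touches only tile-sized pieces of a, b and c.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # block row of c
        for j0 in range(0, m, tile):      # block column of c
            for p0 in range(0, k, tile):  # block along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Calling `matmul_tiled(a, b, tile=t)` should give the same product as `matmul_naive(a, b)` for any positive `t`; only the order in which the partial products are accumulated changes. In compiled code the best block size depends on the cache size of the target hardware, which is why it is left as a parameter here.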
Similar resources
Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
Although sparse matrix-vector multiplication (SPMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) have been considered one of the best candidates for running them. In recent years, power consumption has been considered as one of the metr...
Cache and Bandwidth Aware Matrix Multiplication on the GPU
Recent advances in the speed and programmability of consumer level graphics hardware have sparked a flurry of research that goes beyond the realm of image synthesis and computer graphics. We examine the use of the GPU (graphics processing unit) as a tool for scientific computing, by analyzing techniques for performing large matrix multiplies in GPU hardware. An earlier method for multiplying mat...
Efficient Hardware for Tile-Based Rasterization
An efficient logic-enhanced memory architecture is presented that solves existing problems associated with 3D graphics tile-based hardware rasterization algorithms. The memory contains the same number of bits as the number of pixels in the tile, and during rasterization time it is filled up in several clock cycles by a systolic primitive scanconversion subsystem with the stencil of the primitiv...
Numerical Simulations on PC Graphics Hardware
On recent PC graphics cards, fully programmable parallel geometry and pixel units are available providing powerful instruction sets to perform arithmetic and logical operations. In addition to computational functionality, pixel (fragment) units also provide an efficient memory interface to local graphics data. To take full advantage of this technology, considerable effort has been spent on the ...
Toward Accelerating the Matrix Inversion Computation of Symmetric Positive-Definite Matrices on Heterogeneous GPU-Based Systems
The goal of this paper is to implement an efficient matrix inversion of symmetric positive-definite matrices on heterogeneous GPU-based systems. The matrix inversion procedure can be split into three stages: computing the Cholesky factorization, inverting the Cholesky factor and calculating the product of the inverted Cholesky factor with its transpose to get the final inverted matrix. Using hi...
Journal title:
- Parallel Computing
Volume 33
Publication date: 2007